Crime Prediction in San Francisco

Authors

  • Patrick Phaneuf
  • Dorothy Yen
  • Sean Grady
Abstract

In June 2015, Kaggle began a competition named “San Francisco Crime Classification”[8], which ended in June 2016. The competition’s dataset caught our attention for two reasons: the subject is tangible, with crime at the forefront of modern media, and San Francisco is culturally significant because of its current tech industry. The dataset is also described by geographic and temporal features, which enables potentially interesting visualizations. After our initial investigation of the dataset and the Kaggle competition, we found a large amount of accessible information on different ways of analyzing this data, in blog posts and in scripts published on Kaggle and its forums. Because this is a Kaggle competition, the competition rules also provide a means of evaluation. We chose this dataset for these reasons.

The “San Francisco Crime Classification” dataset, provided by SF OpenData, consists of 878,049 crime reports from all neighborhoods of San Francisco, spanning January 2003 to May 2015. Kaggle splits the data into a training set and a test set: odd-numbered weeks (1, 3, 5, 7, . . . ) go to the training set, and even-numbered weeks (2, 4, 6, 8, . . . ) go to the test set. The fields in each sample are given in table 1.
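The week-parity split described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: weeks are counted as consecutive 7-day blocks from a hypothetical start date of 1 January 2003, and each record is a dictionary with a hypothetical `timestamp` field. The names `week_number` and `split_by_week_parity` are illustrative only and do not come from the competition.

```python
from datetime import date, datetime

# Assumption: weeks are consecutive 7-day blocks counted from this date.
# The abstract does not say how Kaggle actually numbers the weeks.
START = date(2003, 1, 1)


def week_number(d, start=START):
    """Return the 1-based index of the 7-day block that contains d."""
    return (d - start).days // 7 + 1


def split_by_week_parity(records, start=START):
    """Send odd-numbered weeks to train and even-numbered weeks to test.

    Each record is a dict with a hypothetical "timestamp" key formatted
    as "YYYY-MM-DD HH:MM:SS".
    """
    train, test = [], []
    for rec in records:
        d = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M:%S").date()
        if week_number(d, start) % 2 == 1:
            train.append(rec)
        else:
            test.append(rec)
    return train, test


if __name__ == "__main__":
    sample = [
        {"timestamp": "2003-01-03 10:00:00"},  # week 1
        {"timestamp": "2003-01-09 12:30:00"},  # week 2
        {"timestamp": "2003-01-15 23:15:00"},  # week 3
    ]
    train, test = split_by_week_parity(sample)
    print(len(train), "training records,", len(test), "test records")
```

Splitting on alternating weeks rather than at random keeps both sets spread across the full 2003 to 2015 range, so neither set is concentrated in one period.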

Related resources

RoboCop: Crime Classification and Prediction in San Francisco

In this paper, we employ machine learning and other statistical techniques to the problems of classifying and predicting crimes in San Francisco. Drawing upon existing research in the field to approach these two problems, we employ Random Forest and VAR(p) models, respectively. For the classification problem, our results across all 39 crime categories demonstrate the difficulty of the fully-spe...


Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism Assessment

Fig 2 : Overview of the classifier system. Best estimate clinical diagnosis used as ground truth. Objective Language Feature Analysis in Children with Neurodevelopmental Disorders during Autism Assessment Manoj Kumar, Rahul Gupta, Daniel Bone, Nikolaos Malandrakis, Somer Bishop, Shrikanth Narayanan Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles Department...


Peep show establishments, police activity, public place, and time: a study of secondary effects in San Diego, California.

An empirical study was undertaken in San Diego, California, to test assumptions made by the government and by conservative religious policy advocates that there is a greater incidence of crime in the vicinity of peep show establishments. We asked two questions: (a) Is criminal activity in San Diego particularly acute at peep show establishments compared to surrounding control locations? and (b)...


CS 229 Project Report: San Francisco Crime Classification

Different machine learning approaches were conceptualized and implemented for predicting the probabilities of crime categories for crimes reported in San Francisco. The crime records used in the research were downloaded from a competition on Kaggle. A Bayesian model, a mixture of Gaussians model (stratified and unstratified), and logistic regression are implemented. A satisfactory result was ac...


San Francisco Crime Classification

San Francisco Crime Classification is an online competition administered by Kaggle Inc. The competition aims at predicting future crimes based on a given set of geographical and time-based features. In this paper, I achieved an accuracy that ranks in the top 18%, as of May 19th, 2016. I will explore the data and explain in detail the tools I used to achieve that result.



Publication date: 2015